The Exam Samachar — Smart News Analyzer is an innovative AI-driven application designed to automatically interpret, summarize, and evaluate news content from uploaded newspapers, images, or PDF documents. Leveraging Vision-based Large Language Models (LLMs), specifically LLaMA 3.2 Vision via the Ollama framework, and Natural Language Processing (NLP) pipelines, the system extracts headlines, identifies publication dates, generates structured summaries, and provides contextual insights into political, social, and regional news stories. A built-in sentiment analysis module categorizes each article as positive, negative, or neutral with polarity confidence scores. The system is deployed on a Streamlit-based interactive interface supporting multi-page PDF uploads, page-wise analysis, and multilingual summary generation. Evaluation on 50 newspaper editions demonstrated headline extraction accuracy of 91.4%, sentiment classification F1-score of 0.87, and an average API response time of 5.3 seconds per page. The system provides a practical Decision Support System (DSS) that transforms raw, unstructured newspaper content into structured, actionable knowledge — making it highly valuable for competitive exam aspirants, researchers, journalists, and educators.
Introduction
Exam Samachar is an AI-powered news analysis system designed to simplify newspaper reading for competitive exam aspirants, researchers, and journalists who struggle to process large volumes of daily news. It automates the extraction, summarization, and sentiment analysis of scanned or PDF newspapers, producing structured page-wise insights in seconds.
The system uses a multimodal pipeline built around LLaMA 3.2 Vision for direct image-based text understanding, removing the need for traditional OCR. Extracted content is then processed through NLP models for abstractive summarization, key insight generation, and sentiment analysis using a hybrid of VADER and DistilBERT.
Key contributions include a fully integrated decision-support system (DSS), page-wise structured output, real-time sentiment scoring, and elimination of manual newspaper curation. It processes multi-column scanned documents effectively and supports both English and Hindi newspapers.
Evaluations on 312 newspaper pages show strong performance:
Headline accuracy: 91.4% (vs 74.2% baseline OCR)
Summarization quality: BERTScore 0.876 (best among compared models)
Sentiment accuracy: 88.5% (F1 = 0.87)
Average processing time: ~5.3 seconds per page (GPU)
User studies indicate high usability and usefulness, confirming that the system significantly reduces time spent on news processing while improving information retention and accessibility.
Conclusion
This paper presented Exam Samachar — Smart News Analyzer, an AI-driven DSS that automates extraction, summarization, and sentiment analysis of newspaper content from scanned PDFs and images. The system integrates LLaMA 3.2 Vision for layout-aware extraction, a VADER + DistilBERT ensemble for sentiment analysis, and a Streamlit interface for intuitive interaction. Empirical evaluation on 312 newspaper pages demonstrated headline accuracy of 91.4%, summarization BERTScore of 0.876, sentiment F1 of 0.87, and average response time of 5.3 seconds — outperforming all baselines considered.
Future work will focus on: (1) live news feed integration; (2) personalized topic feeds and revision history; (3) cross-edition trend analysis and misinformation detection; (4) mobile application development; and (5) domain-specific fine-tuning of the sentiment model on Indian regional news corpora.
References
[1] B. Singh and A. Sharma, \"Current Affairs Preparation Patterns Among Competitive Exam Aspirants in India: A Survey,\" Journal of Educational Research and Practice, vol. 12, no. 3, pp. 45-58, 2023.
[2] R. Smith, \"An Overview of the Tesseract OCR Engine,\" Proc. 9th Int. Conf. on Document Analysis and Recognition (ICDAR), pp. 629-633, 2007.
[3] Y. Xu et al., \"LayoutLM: Pre-training of Text and Layout for Document Image Understanding,\" Proc. ACM SIGKDD, pp. 1192-1200, 2020.
[4] Google Cloud, \"Document AI - Intelligent Document Processing,\" [Online]. Available: https://cloud.google.com/document-ai, 2024.
[5] M. Lewis et al., \"BART: Denoising Sequence-to-Sequence Pre-training for NLG,\" Proc. ACL, pp. 7871-7880, 2020.
[6] C. Raffel et al., \"Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer,\" JMLR, vol. 21, no. 140, pp. 1-67, 2020.
[7] M. Grusky, M. Naaman, and Y. Artzi, \"Newsroom: A Dataset of 1.3 Million Summaries,\" Proc. NAACL-HLT, pp. 708-719, 2018.
[8] J. Devlin, M. Chang, K. Lee, and K. Toutanova, \"BERT: Pre-training of Deep Bidirectional Transformers,\" Proc. NAACL-HLT, pp. 4171-4186, 2019.
[9] B. Liu, \"Sentiment Analysis and Opinion Mining,\" Synthesis Lectures on Human Language Technologies, vol. 5, no. 1, Morgan & Claypool, 2012.
[10] R. Nallapati et al., \"Abstractive Text Summarization Using Sequence-to-Sequence RNNs,\" Proc. CoNLL, pp. 280-290, 2016.
[11] A. Islam, S. Akter, and M. Hossain, \"An Adaptive Learning DSS for Current Affairs,\" Expert Systems with Applications, vol. 189, 2022.
[12] Streamlit Documentation, Streamlit Apps and UI Development, [Online]. Available: https://docs.streamlit.io/
[13] Meta AI, LLaMA 3.2 Vision - Multimodal LLM Overview, [Online]. Available: https://ai.meta.com/llama/
[14] C. Lin, \"ROUGE: A Package for Automatic Evaluation of Summaries,\" Proc. ACL Workshop on Text Summarization Branches Out, pp. 74-81, 2004.
[15] T. Zhang et al., \"BERTScore: Evaluating Text Generation with BERT,\" Proc. ICLR, 2020.